Market Basket Analysis Algorithm on Map/Reduce in AWS EC2
Author
Abstract
As the web, social networking, and smartphone applications have grown popular, the volume of data generated every day has increased drastically; such data is called Big Data. Google encountered Big Data earlier than most and recognized the importance of storing and computing on it. Google therefore implemented its parallel computing platform using the Map/Reduce approach on the Google File System (GFS). Map/Reduce motivates redesigning and converting existing sequential algorithms into Map/Reduce algorithms for Big Data, and this paper presents a Map/Reduce version of Market Basket Analysis, one of the popular data mining algorithms. The algorithm sorts the data set and converts it into (key, value) pairs to fit the Map/Reduce model. Amazon Web Services (AWS) offers, among its many services, the Apache Hadoop platform, which provides Map/Reduce computing on the Hadoop Distributed File System (HDFS). In this paper, the proposed algorithm is executed on the Amazon EC2 Map/Reduce platform with Hadoop. The experimental results show that the Map/Reduce code performs better as more nodes are added, but beyond a certain point a bottleneck limits the exploitable parallelism and prevents further performance gains. The operations of distributing, aggregating, and reducing data across the Map/Reduce nodes are believed to cause this bottleneck.
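The sort-and-convert step described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's code and does not use Hadoop: the map, shuffle, and reduce phases are simulated in plain Python. The abstract does not say how keys are built, so the choice of sorted item pairs as keys and a count of 1 as the value is an assumption, and the function names `map_transaction`, `reduce_counts`, and `run_market_basket` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(transaction):
    """Map phase: emit ((item_a, item_b), 1) for every item pair in a basket.

    Items are de-duplicated and sorted first, so the same pair always
    yields the same key regardless of the order it appeared in the basket.
    """
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_market_basket(transactions):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    grouped = defaultdict(list)
    for transaction in transactions:
        for key, value in map_transaction(transaction):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample baskets, for illustration only.
    baskets = [
        ["milk", "bread", "eggs"],
        ["bread", "milk"],
        ["eggs", "milk"],
    ]
    for pair, count in sorted(run_market_basket(baskets).items()):
        print(pair, count)
```

Sorting each basket before pairing is what keeps `("bread", "milk")` and `("milk", "bread")` from being counted as two different keys. On a real cluster the grouping step is done by the framework's shuffle, which is one of the data-movement operations the abstract suspects of causing the bottleneck.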
Similar resources
Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing
Map/Reduce approach has been popular in order to compute huge volumes of data since Google implemented its platform on Google Distributed File Systems (GFS) and then Amazon Web Service (AWS) provides its services with Apache Hadoop platform. Map/Reduce motivates to redesign and convert the existing sequential algorithms to Map/Reduce algorithms for big data so that the paper presents Market Bas...
Heterogeneous Multi core processors for improving the efficiency of Market basket analysis algorithm in data mining
Heterogeneous multi core processors can offer diverse computing capabilities. The efficiency of Market Basket Analysis Algorithm can be improved with heterogeneous multi core processors. Market basket analysis algorithm utilises the Apriori algorithm and is one of the popular data mining algorithms which can utilise Map/Reduce framework to perform analysis. The algorithm generates association rule...
Adapting Self-Organizing Maps to the MapReduce Programming Paradigm
We present an adaptation of the self organizing map (SOM) useful for cluster analysis of large quantities of data such as music classification or customer behavior analysis. The algorithm is based on the batch SOM formulation which has been successfully adapted to other parallel architectures and perfectly suits the map reduce programming paradigm, thus enabling the use of large cloud computing i...
Experimental Study of Bidding Strategies for Scientific Workflows using AWS Spot Instances
A spot instance is an auction-based Amazon Elastic Compute Cloud (EC2) instance provided by Amazon Web Services (AWS). It aims to help users reduce their resource renting cost. The price of a spot instance can sometimes be as low as one tenth of the price of an on-demand instance of the same type. However, while gaining significant cost savings on renting resources, users take risks on running ins...
An Application of Genetic Network Programming Model for Pricing of Basket Default Swaps (BDS)
The credit derivatives market has experienced remarkable growth over the past decade. As such, there is a growing interest in tools for pricing of the most prominent credit derivative, the credit default swap (CDS). In this paper, we propose a heuristic algorithm for pricing of basket default swaps (BDS). For this purpose, genetic network programming (GNP), which is one of the recent evolutiona...